1. INTRODUCTION

This Final Report covers work performed at the Naval Aerospace Medical
Research Laboratory during Fiscal Years 1988 and 1989. The Joint Working
Group on Drug Dependent Degradation in Military Performance (JWGD3 MILPERF)
was established to develop and test procedures for evaluating the effects of
chemical defense pharmaceutical agents on military performance. The products
of the JWGD3 have included tests, test batteries, task analysis systems,
performance modeling tools, simulators, databases, and archives of human
performance data. These tools, although specifically designed for chemical
defense analyses, have been used to measure the effects of various
interventions (or stressors) on military performance. Examples of such
interventions and stressors are pharmaceuticals (including prophylactics,
treatment drugs, and performance-enhancing drugs), stressors such as sleep
loss and acceleration, and environmental stressors (e.g., extremes of
temperature; reference 1).

An objective of this laboratory's participation has been to develop
computational models of human performance in operational tasks and in
laboratory performance tests. The purpose has been to develop procedures for
generalizing laboratory measurements of human performance, such as those
derived from the Unified Tri-services Cognitive Performance Test Battery
(UTCPAB; 2-3), so that users can transform data from performance tests into
detailed predictions about performance in operational systems. Such
predictions might be made by first analyzing the temporal organization of
performance in a target operational system into elements and using these
elements to build a model of the system. Test information might then be
transferred between performance-test and operational models when an element
is common to both (and when the information processing requirements and other
contingencies of the system and the performance test are similar). The
simplest example of such a transfer would occur when a parameter of an
operational model element is set equal to its value in the corresponding test
model. Dynamic examples would occur when an operational model parameter is
made to track changes in the corresponding test model parameter that occur as
functions of other variables, such as time.
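
The two kinds of transfer described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not the
project's MicroSAINT code: the names `Element`, `static_transfer`,
`dynamic_transfer`, and `mean_rt_ms`, and all numeric values, are
hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Element:
    """A model element shared by a test model and an operational model."""
    name: str
    params: Dict[str, float] = field(default_factory=dict)


def static_transfer(test: Element, operational: Element, key: str) -> None:
    """Set an operational parameter equal to its value in the test model."""
    operational.params[key] = test.params[key]


def dynamic_transfer(test_param: Callable[[float], float],
                     operational: Element, key: str, t: float) -> None:
    """Make an operational parameter track a test parameter that varies with t."""
    operational.params[key] = test_param(t)


# Hypothetical element common to both models, with a mean reaction time in ms.
test_elem = Element("mental_arithmetic", {"mean_rt_ms": 900.0})
op_elem = Element("mental_arithmetic", {"mean_rt_ms": 0.0})

static_transfer(test_elem, op_elem, "mean_rt_ms")
static_value = op_elem.params["mean_rt_ms"]

# Assumed, purely illustrative time course: mean RT grows 10 ms per hour on task.
dynamic_transfer(lambda hours: 900.0 + 10.0 * hours, op_elem, "mean_rt_ms", t=3.0)
dynamic_value = op_elem.params["mean_rt_ms"]
```

In the static case the operational element simply inherits the test value; in
the dynamic case it is re-evaluated whenever the other variable (here, time)
changes.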


Work originally planned for Fiscal Year 1988 included developing a
task-analytic model of performance in a helicopter simulator. This model was
to have been merged with subsidiary models of the biological effects of
antihistamines and used to predict the effects of antihistamines on
performance in helicopter and (in a second effort) naval-tactical flight
simulators. The work was originally to have been a collaborative effort
involving at least three research projects from two different laboratories.
Various factors combined to render that work unsuccessful. Two important
contributing factors were personnel reassignments and difficulties encountered
in meshing the logistics, instrumentation, and milestone schedules of the
different projects. In Fiscal Year 1989, we focused the project on the
narrower topic of developing techniques for modeling laboratory tests of human
performance (see references 4-6). This allowed us to examine more adequately
some questions regarding how performance test data might actually be
integrated into models of operational tasks.

The performance test models we have developed are driven by equations
derived from empirical data. They were written in MicroSAINT, which is a
task-simulation language that runs on personal computers. MicroSAINT is
derived from the System Analysis of Integrated Networks of Tasks (SAINT,
reference 7). SAINT is a computer-simulation language that runs on mainframe
computers; it was developed for writing network performance models of the type
introduced in human engineering during the 1960s by Siegel and Wolf (8).

Many of the performance models used in human engineering today appear to
be derived from the Siegel-Wolf network approach. Models of this type differ
substantially from the traditional control-theoretic and optimal-control
models of human engineering. Control-theoretic models have typically used
closed-loop stability analysis to generate functions describing the
performance of man-machine system operators. The tasks most frequently
addressed by such models are continuous, manual-control tasks. Optimal-control
models represent the performance of optimum (ideal) controllers in tasks that
also are usually continuous, manual-control tasks. In an optimal-control
model, the simulated controller observes representations of a system's state
variables (corrupted by sensory-system noise) and generates control responses
(corrupted by motor-system noise) that minimize various error and cost
criteria (for a review, see reference 9).

In contrast, network models developed in the Siegel-Wolf tradition
usually represent operator tasks as organized sets of discrete subtasks.
Typically, the representation of a complex task comprises a description of
each of its subtasks and their organization. This description usually
includes: (1) the conditions that must obtain before the subtask can begin,
(2) the conditions obtaining at the end of the subtask, (3) the expected
duration of the subtask (and the variability of its duration), and (4) the
probability of successfully completing the subtask.
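
The four-part subtask description above maps naturally onto a small data
structure. The following is a minimal sketch, not MicroSAINT or SAINT syntax:
the class `Subtask`, the function `run_sequence`, the normally distributed
durations, and the two example subtasks are all assumptions made for
illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, bool]


@dataclass
class Subtask:
    name: str
    can_begin: Callable[[State], bool]   # (1) conditions required before starting
    on_end: Callable[[State], None]      # (2) conditions obtaining at the end
    mean_s: float                        # (3) expected duration, in seconds
    sd_s: float                          #     variability of the duration
    p_success: float                     # (4) probability of successful completion


def run_sequence(subtasks: List[Subtask], state: State, rng: random.Random):
    """Run subtasks in order; return (elapsed seconds, names of failed subtasks)."""
    elapsed, failed = 0.0, []
    for task in subtasks:
        if not task.can_begin(state):
            break  # preconditions not met: the sequence stops here
        # Normal durations are an assumption of this sketch; clip at zero.
        elapsed += max(0.0, rng.gauss(task.mean_s, task.sd_s))
        if rng.random() < task.p_success:
            task.on_end(state)   # apply the end conditions only on success
        else:
            failed.append(task.name)
    return elapsed, failed


def _set(key):
    """Helper: build an end-condition function that sets one state flag."""
    return lambda s: s.__setitem__(key, True)


# Hypothetical two-step task: read a display, then key a response.
tasks = [
    Subtask("read_display", lambda s: True, _set("display_read"), 1.2, 0.3, 0.99),
    Subtask("key_response", lambda s: s.get("display_read", False),
            _set("responded"), 0.8, 0.2, 0.95),
]
elapsed, failed = run_sequence(tasks, {}, random.Random(1))
```

Repeating `run_sequence` many times with different seeds would yield
distributions of task time and failure counts, which is the kind of output a
network simulation is meant to provide.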

Control-theoretic and optimal-control models lend themselves most
naturally to the description of continuous tasks. Their application, however,
has not been limited to continuous tasks. An example is the Procedure-Oriented
Crew Model (PROCRU, reference 10). The PROCRU model originated as a
control-theory based model of the approach-to-landing stage of flight in a
commercial airliner. It contains submodels describing flight control, display
monitoring, communicating with air traffic controllers, and other flight
activities. Similarly, although network models lend themselves most naturally
to discrete tasks, their application has not been limited to discrete tasks.
An example is the network model of the LHX helicopter developed in MicroSAINT
by Laughery, Drews, Archer, and Kramme (11). One of the outputs of this model
is a continuous variable whose value is an estimate of instantaneous operator
workload during the course of a mission.

Because the psychometric models developed under this project follow a
common plan and are written in a standard language, they are substantially
easier to use than most computational performance models. Simulations can be
specified, run, and analyzed using MicroSAINT's standard collection of
menu-driven utilities. Thus, variables can be altered at the MicroSAINT
Simulation Scenario menu. Data to be saved can be specified at the MicroSAINT
Snapshots of Execution menu. Simulations can be run from the MicroSAINT Model
Execution menu. Finally, data can be analyzed from the MicroSAINT Analysis of
Results menu.


2. METHODS

The performance assessment test models we have developed follow the plan
of the UTCPAB Generic Task. The Generic Task is a general model of the
temporal organization of most of the tests of the UTCPAB. It is also the
basic plan followed by the computer programs of the UTCPAB Authoring System,
the set of computer routines that make up the tests of the UTCPAB. Thus the
models have the same temporal structure as the tests themselves. They
represent the trial-by-trial temporal organization of behavior in the tests,
that is, the tests' performance structures.

EMPIRICAL PERFORMANCE DATA


We obtained estimates of the models' human-performance parameters from
data provided by D. L. Reeves of the Naval Aerospace Medical Research
Laboratory. The subjects were 28 male Naval and Marine Aviation Candidates.
The data were obtained in a session comprising four repetitions of a battery
of tests drawn from the Walter Reed Performance Assessment Battery (WRAIRPAB;
12). An examination of the data indicated no significant change in the
subjects' average performance across these repetitions, so we derived our
parameter estimates from all four repetitions of the tests. In general, the
subjects' responses were sorted by correctness and reaction time (RT). The
data were used to estimate the overall proportions of correct and incorrect
responses [P(c) and P(i), respectively], the average correct- and
incorrect-response RTs [RT(c) and RT(i), respectively], and the standard
deviations of correct and incorrect single-trial RTs [SD(rt(c)) and
SD(rt(i)), respectively].
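
The six estimates named above can be computed from trial-level records as in
the following sketch. The function name `summarize`, the record layout, and
the example responses are hypothetical; the use of the sample (n - 1)
standard deviation is also an assumption, since the source does not say which
estimator was used.

```python
from statistics import mean, stdev


def summarize(trials):
    """trials: iterable of (correct: bool, rt_ms: float), one entry per response.

    Needs at least two correct and two incorrect responses for the SDs.
    """
    correct = [rt for ok, rt in trials if ok]
    incorrect = [rt for ok, rt in trials if not ok]
    n = len(correct) + len(incorrect)
    return {
        "P(c)": len(correct) / n,        # proportion of correct responses
        "P(i)": len(incorrect) / n,      # proportion of incorrect responses
        "RT(c)": mean(correct),          # mean correct-response RT
        "RT(i)": mean(incorrect),        # mean incorrect-response RT
        "SD(rt(c))": stdev(correct),     # sample SD of correct-response RTs
        "SD(rt(i))": stdev(incorrect),   # sample SD of incorrect-response RTs
    }


# Made-up responses for illustration only (not the study's data).
example = [(True, 700.0), (True, 900.0), (True, 1100.0),
           (False, 1400.0), (False, 1600.0)]
stats = summarize(example)
```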


SIMULATION  PROCEDURES 

MicroSAINT supplies gamma, normal, uniform, exponential, and Poisson
random number generators. Of these, the exponential, Poisson, and gamma are
skewed like empirical RT distributions. The exponential distribution, which
is a special case of the gamma, yields only crude approximations to the shapes
of empirical RT distributions. Both the Poisson and gamma resemble RT
distributions qualitatively. The gamma distribution, however, applies more
naturally than the Poisson to temporal variables (13). (The Poisson describes
counts of events separated by exponentially distributed intervals.) The gamma
also has two parameters v. the Poisson's one, which sometimes makes the gamma
easier to fit. Based on these considerations, we used MicroSAINT's
gamma-distributed random number generator to simulate RT in most of our
models. This decision was made for the purpose of accurately describing the
empirical data. We do not mean to suggest that gamma-distributed RTs
necessarily follow from a theory of mental arithmetic (indeed, the data
suggest otherwise).

Sequences of correct and incorrect responses were simulated by treating
responses as Bernoulli trials. Thus, the models generate RTs by drawing from
simulated correct-response RT distributions on a randomly determined
100[p(c)]% of all trials. Similarly, the models draw RTs from a simulated
incorrect-response RT distribution on a random 100[1-p(c)]% of all trials.

The first-approximation models draw correct-response RTs from one probability
distribution with a mean of RT(c) and a standard deviation of SD(rt(c)), and
draw incorrect-response RTs from a second probability distribution with a
mean of RT(i) and a standard deviation of SD(rt(i)). (We will see, presently,
that this strategy does not always work.)
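
The first-approximation procedure just described can be sketched outside
MicroSAINT as follows. This is an illustration under assumptions, not the
project's code: it uses Python's standard `random.gammavariate(alpha, beta)`
(shape `alpha`, scale `beta`), it obtains the gamma parameters by matching
the mean and standard deviation, and the function names and the parameter
values in the usage lines are hypothetical.

```python
import random


def gamma_params(mean, sd):
    """Shape and scale of the gamma distribution with the given mean and SD."""
    shape = (mean / sd) ** 2     # since mean = shape * scale
    scale = sd ** 2 / mean       # and variance = shape * scale ** 2
    return shape, scale


def simulate_trials(n, p_correct, rt_c, sd_c, rt_i, sd_i, seed=0):
    """Return a list of (correct, rt) pairs for n simulated trials."""
    rng = random.Random(seed)
    shape_c, scale_c = gamma_params(rt_c, sd_c)
    shape_i, scale_i = gamma_params(rt_i, sd_i)
    trials = []
    for _ in range(n):
        correct = rng.random() < p_correct            # Bernoulli trial
        if correct:
            rt = rng.gammavariate(shape_c, scale_c)   # correct-response RT
        else:
            rt = rng.gammavariate(shape_i, scale_i)   # incorrect-response RT
        trials.append((correct, rt))
    return trials


# Illustrative parameter values only (hypothetical, in ms).
trials = simulate_trials(20000, p_correct=0.9, rt_c=800.0, sd_c=400.0,
                         rt_i=1500.0, sd_i=900.0, seed=42)
n_correct = sum(ok for ok, _ in trials)
prop_correct = n_correct / len(trials)
mean_rt_correct = sum(rt for ok, rt in trials if ok) / n_correct
```

With enough trials, `prop_correct` and `mean_rt_correct` should approach the
values supplied, which gives a quick check that the generator reproduces the
intended summary statistics.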


3.  RESULTS  AND  DISCUSSION 

Figure 1 contains an example data set comprising the overall RT
histograms for correct and incorrect responses in the Serial Addition and
Subtraction (SAS) test of the WRAIRPAB. (A fuller treatment of the data can
be found in reference 14.) The RT histograms were collapsed across subjects,
test repetitions, and trials within repetitions. Several properties of the
SAS RT distributions should be noted. First, the histograms have the
positively skewed appearance typical of most RT distributions (13). Second,
correct responses occur more frequently than errors [p(i) = 0.02 v.
p(c) = 0.98]. Third, correct-response reaction times are shorter, on average,
than incorrect-response reaction times [RT(c) = 876.94 ms v.
RT(i) = 1532.34 ms]. Fourth, the variability of the correct-response reaction
times is less than the variability of the incorrect-response reaction times
[SD(rt(c)) = 632.21 v. SD(rt(i)) = 1153.27]. These are all standard results.
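
As a worked illustration, the summary statistics above can be converted into
gamma parameters by matching the first two moments. The sketch assumes a
shape-scale parameterization (shape k, scale θ, symbols that do not appear in
the source), for which the mean is kθ and the variance is kθ², and it assumes
the standard deviations are in ms like the means.

```latex
\begin{align*}
k &= \left(\frac{\overline{RT}}{SD}\right)^{2}, &
\theta &= \frac{SD^{2}}{\overline{RT}}, \\
k_c &= \left(\frac{876.94}{632.21}\right)^{2} \approx 1.92, &
\theta_c &= \frac{632.21^{2}}{876.94} \approx 455.8\ \text{ms}, \\
k_i &= \left(\frac{1532.34}{1153.27}\right)^{2} \approx 1.77, &
\theta_i &= \frac{1153.27^{2}}{1532.34} \approx 868.0\ \text{ms}.
\end{align*}
```

Both shape values fall below 2, so the matched gamma distributions are
strongly right-skewed (the gamma skewness, 2/√k, is about 1.44 and 1.51,
respectively).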

Figure 1. Correct- and incorrect-response reaction time (RT) distributions
in Serial Addition and Subtraction. (Panels: CORRECT and INCORRECT;
horizontal axes: RT in s.)


Figure 2 illustrates observed and predicted correct-response RT
distributions in Serial Addition and Subtraction. The function labeled
"Observed" is the empirical correct-response RT distribution. The function
labeled "Full Data Set" is a gamma distribution with parameters (mean and
variance) equalling the mean and variance of the empirical RT distribution.
The correspondence is not especially close: the distribution of empirical RTs
is much more peaked than the corresponding gamma distribution. A
goodness-of-fit test using intervals containing expected frequencies of 5 or
more yielded a Chi-square of 772.95 (df = 16, p < 0.005), which clearly allows
us to reject the hypothesis that the empirical RTs arose from a gamma
distribution with the same mean and variance as the data.
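
A goodness-of-fit calculation of this general kind can be sketched as
follows. This is not the authors' procedure: the function names, the series
approximation to the gamma distribution function, and the choice to drop
(rather than pool) intervals with expected frequencies below 5 are
assumptions made for illustration.

```python
import math


def gamma_cdf(x, shape, scale):
    """Gamma CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = total = 1.0 / shape
    n = 1
    # Sum z**n / (shape * (shape + 1) * ... * (shape + n)) until it converges.
    while term > 1e-16 * total and n < 100000:
        term *= z / (shape + n)
        total += term
        n += 1
    log_front = shape * math.log(z) - z - math.lgamma(shape)
    return min(1.0, total * math.exp(log_front))


def chi_square_gamma(rts, edges, shape, scale, min_expected=5.0):
    """Chi-square over histogram intervals whose expected count is >= min_expected.

    Returns (statistic, number of intervals used).
    """
    n = len(rts)
    stat, used = 0.0, 0
    for lo, hi in zip(edges[:-1], edges[1:]):
        observed = sum(lo <= rt < hi for rt in rts)
        expected = n * (gamma_cdf(hi, shape, scale) - gamma_cdf(lo, shape, scale))
        if expected >= min_expected:
            stat += (observed - expected) ** 2 / expected
            used += 1
    return stat, used
```

By the usual convention the degrees of freedom are the number of intervals
used, minus one, minus the number of parameters estimated from the data; the
source does not state which convention was applied.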


Figure 2. Observed and predicted correct-response reaction time (RT)
distributions for Serial Addition and Subtraction. (Vertical axis: COUNT;
horizontal axis: RT in s.)


If the data in the tail of the empirical RT distribution are ignored,
the fit of the gamma to the empirical distribution is visibly improved. This
is illustrated by the function labeled "RTs < 2000 ms," which is the gamma
distribution with the same mean and variance as the subset of correct
responses with RTs less than 2000 ms. The mean and variance of this
distribution are 797.235 ms and 143587 ms^2, respectively. A test of this
distribution's goodness of fit also fails. The failure is somewhat less
spectacular than before. A test calculated using the intervals with expected
frequencies of 5 or more yields a Chi-square of 597.21 (df = 9, p < 0.005).
In this case, most of the discrepancy can be attributed to RTs in the range
of 1.5-2.5 s. In this region, the ordinate of the predicted curve falls well
below that of the observed distribution (see the figure). Despite this
result, the observed and predicted RT counts (in the intervals with more than
5 expected RTs) yield a highly respectable correlation (r = 0.9764).
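
The correlation between observed and predicted counts can be computed as in
the sketch below. The function names and the interval counts are made up for
illustration; only the restriction to intervals with more than 5 expected RTs
comes from the text.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def count_correlation(observed, expected, min_expected=5.0):
    """Correlate counts over the intervals with more than min_expected RTs."""
    pairs = [(o, e) for o, e in zip(observed, expected) if e > min_expected]
    return pearson_r([o for o, _ in pairs], [e for _, e in pairs])


# Made-up interval counts for illustration (not the study's data).
observed = [12, 40, 55, 30, 14, 6, 2]
expected = [15.0, 38.0, 50.0, 33.0, 16.0, 7.0, 3.0]
r = count_correlation(observed, expected)
```

A high correlation can coexist with a failed Chi-square test: the correlation
reflects agreement in the overall shape of the two count profiles, whereas
the Chi-square statistic grows with the number of observations and so detects
small systematic discrepancies in large data sets.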


Figure 3 contains the observed and predicted incorrect-response RT
distributions. In this case, a gamma distribution with mean and variance
equal to those of the empirical incorrect-response RTs approximates the
empirical RT distribution well. The goodness-of-fit calculation, again based
on intervals with more than 5 expected RTs, yields a Chi-square of 4.94
(df = 3, p < 0.25). The correlation between the observed and predicted RT
counts in those intervals again is quite high (r = 0.9473).


4.  CONCLUSIONS 

Models are abstract representations of systems. A model of a system
consists of a set of important system variables and a set of relations among
them. Models can be useful because they are compact relative to the systems
they describe, and because they can be used to predict some of the effects of
variation in system variables. A map, for example, is useful because it is
more compact than the geography it describes and because it can be used to
predict some of the consequences of changes in latitude and longitude.

The models we have described here and elsewhere are sequential-network
designs; they are essentially task-analytic in nature. We think that modeling
operational tasks in this fashion clearly represents an improvement in the
quantitative description of human performance in operational systems. We also
submit that computational models can improve the quantitative description of
performance in laboratory tests. This is partly because it is possible to
develop models that retain the statistical properties of behavior that
summary measures discard. Our models could, in fact, be expressed as
equations. In part, this is because we have approximated the empirical
performance data with probability distributions whose algebraic properties
are well understood. We selected these distributions for reasons of
computational efficiency. The penalty incurred was a loss of accuracy.
Greater accuracy could be achieved, for example, by smoothing the empirical
reaction-time histograms and sampling from the distributions thereby
produced. Such nonparametric approaches to building models often produce
results that are difficult or impossible to derive mathematically. Models
based on theoretical considerations that are not easily related to
well-developed bodies of statistical theory encounter similar problems
(consider, for example, the difficulty of predicting the performance of
neural networks). In such cases, computational procedures are often the only
practical means of examining a problem.
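
The nonparametric alternative mentioned above (smoothing an empirical RT
histogram and sampling from the result) might look like the following sketch.
The text names only the general idea; the 3-point moving average, the
inverse-CDF sampling, the function names, and the example histogram are all
assumptions made for illustration.

```python
import bisect
import random


def smooth(counts, passes=1):
    """Smooth histogram counts with a 3-point moving average.

    The end bins are padded by repeating the first and last counts.
    """
    for _ in range(passes):
        padded = [counts[0]] + list(counts) + [counts[-1]]
        counts = [(padded[i] + padded[i + 1] + padded[i + 2]) / 3.0
                  for i in range(len(counts))]
    return counts


def make_sampler(edges, counts, rng):
    """Return a function that draws RTs from a histogram by inverse-CDF sampling."""
    total = float(sum(counts))
    cum, running = [], 0.0
    for c in counts:
        running += c / total
        cum.append(running)      # cumulative proportion up to each interval

    def draw():
        u = rng.random()
        i = min(bisect.bisect_left(cum, u), len(counts) - 1)
        # Place the draw uniformly within the selected interval.
        return rng.uniform(edges[i], edges[i + 1])

    return draw


# Made-up histogram (counts per 500-ms interval) for illustration only.
edges = [0.0, 500.0, 1000.0, 1500.0, 2000.0, 2500.0]
counts = [5, 60, 25, 8, 2]
draw = make_sampler(edges, smooth(counts), random.Random(7))
samples = [draw() for _ in range(1000)]
```

A sampler of this kind reproduces whatever shape the data have, at the cost
of the closed-form properties that make the gamma approximation convenient.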

Figure 3. Observed and predicted incorrect-response reaction time (RT)
distributions for Serial Addition and Subtraction. (Vertical axis: COUNT.)

An important question is whether a laboratory test that differs
substantially from an operational behavior of interest can ever yield accurate
predictions of real-life behavior. For example, to demonstrate that a
stressor affects human performance in an operational system requires one to
show that the stressor changes the normal pattern of relations among the
system, its operator, and the environment. A fairly direct approach to
performing such a demonstration involves examining the stressor's effects in a
hardware simulator (a flight simulator, for example). Simulator research,
however, is slow and costly. Abstract laboratory tests are faster and more
economical. If properly carried out, laboratory tests should also produce more
reliable results because more observations can be obtained at the same cost.
Laboratory tasks, however, do not look like operational tasks. Consequently,
they are often regarded with suspicion.

The only way to demonstrate empirically that performance on an abstract
test predicts a variable's effects on performance in an operational system is
to: (1) measure the effect of the variable on the test, (2) measure the
effect of the variable on system performance, and then (3) show that these
effects covary. However, this necessarily costs more than simply measuring the
effect of the variable on operational performance. Thus, to justify the
economics of such an enterprise, one must be able to say that any association
found is reasonably likely to generalize to new tasks or new forms of
operational performance. Asserting that a result will generalize, however,
requires a separate appeal to theory or to a body of empirical evidence.

In principle, computational procedures can be used to amplify the
information derived from the type of study just described. In particular,
these techniques are useful for deriving predictions for new scenarios. This
is exactly like deriving new predictions from theory. The process of deriving
implications and then confirming or disconfirming them empirically is the
pattern followed in the development of any body of scientific theory. Because
computational techniques can accelerate the process of deriving predictions,
they can improve the efficiency of experimentation: a well-designed
simulation can rapidly explore the variable space of a theory for regions
where its predictions are clearest. With this information, experiments can be
optimized to provide strong tests of the theory by concentrating observations
where they will do the most good. In this way, computational procedures can
increase the rate at which useful information is acquired and, thereby,
increase the range of phenomena that can be explored in a given amount of
time.